07. Quiz: Q-Learning
Say that an agent is learning to navigate the gridworld described earlier in the lesson.

[Figure: Gridworld example]
Suppose the agent is using Q-Learning in its search for the optimal policy, with learning rate $\alpha = 0.1$.
At the end of the 99th episode, the Q-table has the following values:

[Figure: Q-table]
Say that at the beginning of the 100th episode, the agent starts in state 1 and selects action right. As a result, it receives reward -1, and the next state is state 2.

[Figure: Beginning of the 100th episode]
In the previous video, you learned that at this point in time, the agent updates the Q-table.
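As a rough illustration of that update step, here is a minimal Python sketch of the standard Q-Learning rule, Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). Only alpha = 0.1, the transition (state 1, action right, reward -1, next state 2) come from the quiz. The Q-values below are made-up placeholders, not the quiz's Q-table, and both the discount gamma = 1.0 and the action names other than right are assumptions made for illustration.

```python
def q_learning_update(Q, state, action, reward, next_state, alpha, gamma):
    """Apply one Q-Learning update in place and return the new Q(state, action)."""
    # Value of the best action available in the next state.
    # Falls back to 0.0 when the next state has no recorded actions (e.g. terminal).
    best_next = max(Q.get(next_state, {}).values(), default=0.0)
    td_target = reward + gamma * best_next
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q[state][action]


# Placeholder Q-values (hypothetical; substitute the values from the quiz's Q-table).
Q = {
    1: {"left": 1.0, "right": 2.0, "up": 0.5, "down": 1.5},
    2: {"left": 1.0, "right": 3.0, "up": 0.5, "down": 4.0},
}

alpha = 0.1  # given in the quiz
gamma = 1.0  # assumed; the quiz text here does not state the discount rate

# Transition from the quiz: state 1, action right, reward -1, next state 2.
new_value = q_learning_update(Q, state=1, action="right", reward=-1,
                              next_state=2, alpha=alpha, gamma=gamma)
print(round(new_value, 4))
```

Note that the update uses the maximum Q-value in the next state regardless of which action the agent actually takes next; only the entry for (state 1, action right) changes.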